Results 1 - 20 of 21
1.
BMC Bioinformatics ; 23(1): 263, 2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35794528

ABSTRACT

BACKGROUND AND OBJECTIVE: Although rare diseases are individually characterized by low prevalence, approximately 400 million people are affected by a rare disease. The early and accurate diagnosis of these conditions is a major challenge for general practitioners, who often lack the knowledge needed to identify them. In addition, rare diseases usually show a wide variety of manifestations, which can make diagnosis even more difficult. A delayed diagnosis can negatively affect the patient's life. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) and Deep Learning can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment. METHODS: The paper explores several deep learning techniques, such as Bidirectional Long Short-Term Memory (BiLSTM) networks and deep contextualized word representations based on Bidirectional Encoder Representations from Transformers (BERT), to recognize rare diseases and their clinical manifestations (signs and symptoms). RESULTS: BioBERT, a domain-specific language representation based on BERT and trained on biomedical corpora, obtains the best results, with an F1 of 85.2% for rare diseases. Since many signs are described by complex noun phrases involving overlapping, nested and discontinuous entities, the model performs worse on them, with an F1 of 57.2%. CONCLUSIONS: While our results are promising, there is still much room for improvement, especially with respect to the identification of clinical manifestations (signs and symptoms).
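
A minimal sketch of how BioBERT can be set up for token-level recognition of rare diseases and signs with the Hugging Face transformers library; the checkpoint name and the BIO tag set are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of token classification with BioBERT for rare-disease NER.
# The checkpoint name and the BIO label set are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-RAREDISEASE", "I-RAREDISEASE", "B-SIGN", "I-SIGN"]  # hypothetical tag set
MODEL = "dmis-lab/biobert-base-cased-v1.1"  # BioBERT checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=len(LABELS))

text = "Marfan syndrome is often associated with aortic dilatation."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                    # (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)[0].tolist()

# The classification head is randomly initialised here; in practice the model
# is fine-tuned on the annotated corpus before the predictions are meaningful.
for token, label_id in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), predictions):
    print(f"{token}\t{LABELS[label_id]}")
```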


Subjects
Deep Learning, Rare Diseases, Humans, Language, Natural Language Processing, Rare Diseases/diagnosis
2.
J Biomed Inform ; 125: 103961, 2022 01.
Article in English | MEDLINE | ID: mdl-34879250

ABSTRACT

Rare diseases affect a small number of people compared to the general population; however, more than 6,000 different rare diseases exist and, in total, they affect more than 300 million people worldwide. Rare diseases share, as part of their main problem, delayed diagnosis and the sparse information available to researchers, clinicians, and patients. Finding a diagnosis can be a very long and frustrating experience for patients and their families; the average diagnostic delay is between 6 and 8 years. Many of these diseases show different manifestations across patients, which further hampers their detection and the choice of the correct treatment. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment, but most NLP techniques require manually annotated corpora. Our goal is therefore to create a gold standard corpus annotated with rare diseases and their clinical manifestations. It could be used to train and test NLP approaches, and the information extracted through NLP could enrich the knowledge of rare diseases and thereby help to reduce the diagnostic delay and improve treatment. The paper describes the selection of 1,041 texts to be included in the corpus, the annotation process and the annotation guidelines. The entities (disease, rare disease, symptom, sign and anaphor) and the relationships (produces, is a, is acron, is synon, increases risk of, anaphora) were annotated. The RareDis corpus contains more than 5,000 annotated rare diseases and almost 6,000 annotated clinical manifestations. Moreover, the inter-annotator agreement evaluation shows relatively high agreement (an F1-measure of 83.5% under the exact-match criterion for the entities and of 81.3% for the relations). Based on these results, the corpus is of high quality, representing a significant step for the field given the scarcity of available corpora annotated with rare diseases. This could open the door to further NLP applications that would facilitate the diagnosis and treatment of these rare diseases and, therefore, dramatically improve the quality of life of these patients.
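
The exact-match agreement figure reported above can be illustrated with a small sketch that compares the entity spans of two annotators, treating one as reference and the other as prediction; the spans below are invented.

```python
# Toy sketch of inter-annotator agreement under an exact-match criterion:
# F1 is computed over (start, end, type) spans produced by two annotators.
def f1_exact(ann_a, ann_b):
    a, b = set(ann_a), set(ann_b)
    tp = len(a & b)
    precision = tp / len(b) if b else 0.0
    recall = tp / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

annotator_1 = {(10, 28, "RAREDISEASE"), (45, 62, "SIGN"), (70, 81, "SYMPTOM")}
annotator_2 = {(10, 28, "RAREDISEASE"), (45, 60, "SIGN")}  # boundary disagreement on the sign

print(f"Exact-match F1: {f1_exact(annotator_1, annotator_2):.3f}")
```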


Subjects
Quality of Life, Rare Diseases, Delayed Diagnosis, Humans, Natural Language Processing, Rare Diseases/diagnosis
3.
J Biomed Inform ; 110: 103539, 2020 10.
Article in English | MEDLINE | ID: mdl-32818665

ABSTRACT

Since the turn of the century, as millions of users' opinions have become available on the web, sentiment analysis has become one of the most fruitful research fields in Natural Language Processing (NLP). Research on sentiment analysis has covered a wide range of domains such as economics, politics, and medicine, among others. In the pharmaceutical field, automatic analysis of online user reviews makes it possible to process large numbers of users' opinions and to obtain relevant information about the effectiveness and side effects of drugs, which could be used to improve pharmacovigilance systems. Over the years, approaches for sentiment analysis have progressed from simple rules to advanced machine learning techniques such as deep learning, which has become an emerging technology in many NLP tasks. Sentiment analysis is not oblivious to this success, and several systems based on deep learning have recently demonstrated their superiority over former methods, achieving state-of-the-art results on standard sentiment analysis datasets. However, prior work shows that very few attempts have been made to apply deep learning to sentiment analysis of drug reviews. We present a benchmark comparison of various deep learning architectures such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) recurrent neural networks. We propose several combinations of these models and also study the effect of different pre-trained word embedding models. As transformers have revolutionized the NLP field, achieving state-of-the-art results for many NLP tasks, we also explore Bidirectional Encoder Representations from Transformers (BERT) with a Bi-LSTM for the sentiment analysis of drug reviews. Our experiments show that using BERT obtains the best results, but with a very high training time. On the other hand, CNN achieves acceptable results while requiring less training time.
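
A minimal sketch, assuming PyTorch, of the kind of CNN-over-pre-trained-embeddings classifier compared in this benchmark; the embedding matrix, filter sizes and class count are placeholders rather than the paper's settings.

```python
# Minimal PyTorch sketch of a CNN sentiment classifier over pre-trained word
# embeddings. Dimensions and the random "pre-trained" matrix are assumptions.
import torch
import torch.nn as nn

class CNNSentiment(nn.Module):
    def __init__(self, pretrained_embeddings, num_classes=3, n_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=False)
        emb_dim = pretrained_embeddings.size(1)
        self.convs = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(n_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1)) # (batch, num_classes)

# Stand-in for word2vec/GloVe/fastText vectors loaded from disk.
fake_pretrained = torch.randn(5000, 300)
model = CNNSentiment(fake_pretrained)
logits = model(torch.randint(0, 5000, (8, 50)))          # 8 reviews of 50 tokens
print(logits.shape)                                       # torch.Size([8, 3])
```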


Subjects
Deep Learning, Pharmaceutical Preparations, Machine Learning, Natural Language Processing, Neural Networks, Computer
4.
J Biomed Inform ; 99: 103285, 2019 11.
Article in English | MEDLINE | ID: mdl-31546016

ABSTRACT

This work presents a two-stage deep learning system for Named Entity Recognition (NER) and Relation Extraction (RE) from medical texts. These tasks are a crucial step for many natural language understanding applications in the biomedical domain. Automatic medical coding of electronic medical records, automated summarization of patient records, automatic cohort identification for clinical studies, text simplification of health documents for patients, early detection of adverse drug reactions and automatic identification of risk factors are only a few examples of the many opportunities that text analysis can offer in the clinical domain. In this work, our efforts are primarily directed towards improving the pharmacovigilance process through the automatic detection of drug-drug interactions (DDI) from texts. Moreover, we deal with the semantic analysis of texts containing health information for patients. Our two-stage approach is based on deep learning architectures: NER is performed by combining a bidirectional Long Short-Term Memory (Bi-LSTM) and a Conditional Random Field (CRF), while RE applies a Convolutional Neural Network (CNN). Since our approach uses very few language resources (only pre-trained word embeddings) and does not exploit any domain resources (such as dictionaries or ontologies), it can easily be extended to support other languages and clinical applications that require the exploitation of semantic information (concepts and relationships) from texts. In recent years, the task of DDI extraction has received great attention from the BioNLP community. However, the problem has traditionally been evaluated as two separate subtasks: drug name recognition and extraction of DDIs. To the best of our knowledge, this is the first work that provides an evaluation of the whole pipeline. Moreover, our system obtains state-of-the-art results on the eHealth-KD challenge, which was part of the Workshop on Semantic Analysis at SEPLN (TASS-2018).
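
A compact sketch of the NER stage (Bi-LSTM emissions plus a CRF decoder). It assumes the third-party pytorch-crf package and invented dimensions, so it is an approximation of the architecture rather than the authors' implementation.

```python
# Sketch of the NER stage: a BiLSTM produces per-token emission scores and a
# CRF layer decodes the best tag sequence. Vocabulary size, tag set and
# dimensions are assumptions.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128, num_tags=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.emissions(self.lstm(self.embedding(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)      # negative log-likelihood

    def decode(self, tokens, mask):
        emissions = self.emissions(self.lstm(self.embedding(tokens))[0])
        return self.crf.decode(emissions, mask=mask)       # best tag sequence per sentence

model = BiLSTMCRF()
tokens = torch.randint(0, 10000, (2, 12))
tags = torch.randint(0, 5, (2, 12))
mask = torch.ones(2, 12, dtype=torch.bool)
print(model.loss(tokens, tags, mask).item())
print(model.decode(tokens, mask)[0])
```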


Subjects
Data Mining/methods, Deep Learning, Electronic Health Records/classification, Clinical Coding, Drug Interactions, Humans
5.
J Am Med Inform Assoc ; 26(11): 1181-1188, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31532478

ABSTRACT

OBJECTIVE: The goal of the 2018 n2c2 shared task on cohort selection for clinical trials (track 1) is to identify which patients meet the selection criteria for clinical trials. Cohort selection is a particularly demanding task to which natural language processing and deep learning can make a valuable contribution. Our goal is to evaluate several deep learning architectures for this task. MATERIALS AND METHODS: Cohort selection can be formulated as a multi-label classification problem whose goal is to determine which criteria are met for each patient record. We explore several deep learning architectures: a simple convolutional neural network (CNN), a deep CNN, a recurrent neural network (RNN), and a CNN-RNN hybrid architecture. Although our architectures are similar to those proposed in existing deep learning systems for text classification, our research also studies the impact of adding a fully connected feedforward layer on the performance of these architectures. RESULTS: The RNN and hybrid models provide the best results, though without statistical significance. The fully connected feedforward layer improves the results for all the architectures except the hybrid one. CONCLUSIONS: Despite the limited size of the dataset, deep learning methods show promising results in learning useful features for the task of cohort selection. Therefore, they can be used as a preliminary filter for cohort selection in any clinical trial with a minimum of human intervention, thus significantly reducing the cost and time of clinical trials.
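
The multi-label formulation can be sketched as one sigmoid output per eligibility criterion trained with binary cross-entropy, with the optional fully connected feedforward layer behind a flag; the encoder, sizes and criterion count below are assumptions.

```python
# Sketch of cohort selection as multi-label classification: one sigmoid output
# per criterion. The document encoder (CNN/RNN) is a stand-in tensor here.
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    def __init__(self, encoder_dim=256, num_criteria=13, use_ff_layer=True):
        super().__init__()
        layers = []
        if use_ff_layer:                        # extra fully connected feedforward layer
            layers += [nn.Linear(encoder_dim, encoder_dim), nn.ReLU()]
        layers.append(nn.Linear(encoder_dim, num_criteria))
        self.head = nn.Sequential(*layers)

    def forward(self, doc_repr):                # doc_repr from a CNN/RNN encoder
        return self.head(doc_repr)              # raw logits, one per criterion

head = MultiLabelHead()
doc_repr = torch.randn(4, 256)                  # 4 encoded patient records
targets = torch.randint(0, 2, (4, 13)).float()  # met / not met per criterion
loss = nn.BCEWithLogitsLoss()(head(doc_repr), targets)
probs = torch.sigmoid(head(doc_repr))           # per-criterion probabilities
print(loss.item(), probs.shape)
```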


Subjects
Clinical Trials as Topic/methods, Data Mining/methods, Deep Learning, Neural Networks, Computer, Patient Selection, Humans, Natural Language Processing
6.
J Biomed Inform ; 87: 50-59, 2018 11.
Article in English | MEDLINE | ID: mdl-30266231

ABSTRACT

Anaphylaxis is a life-threatening allergic reaction that occurs suddenly after contact with an allergen. Epidemiological studies of anaphylaxis are very important both for planning and evaluating new prevention strategies and for guiding the treatment of patients who have just suffered an anaphylactic reaction. Electronic Medical Records (EMR) are one of the most effective and richest sources for the epidemiology of anaphylaxis, because they provide a low-cost way of accessing rich longitudinal data on large populations. However, a drawback is that researchers have to manually review a huge amount of information, which is a very costly and highly time-consuming task. Therefore, our goal is to explore different machine learning techniques for processing large volumes of EMRs, lessening the effort needed to perform epidemiological studies about anaphylaxis. In particular, we aim to study the incidence of anaphylaxis through the automatic classification of EMRs. To do this, we employ the most widely used and efficient classifiers in text classification and compare different document representations, ranging from well-known methods such as Bag of Words (BoW) to more recent ones based on word embedding models, such as a simple average of word embeddings or a bag of centroids of word embeddings. Because the identification of anaphylaxis cases in EMRs is a class-imbalanced problem (less than 1% of the records describe anaphylaxis cases), we employ a novel undersampling technique based on clustering to balance our dataset. In addition to classical machine learning algorithms, we also use a Convolutional Neural Network (CNN) to classify our dataset. In general, the experiments show that most classifiers and representations are effective (F1 above 90%). Logistic Regression, Linear SVM, Multilayer Perceptron and Random Forest achieve an F1 around 95%, although the linear methods have considerably lower training times. The CNN provides slightly better performance (F1 = 95.6%).
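
A hedged sketch of clustering-based undersampling with scikit-learn: the majority class is clustered and only the documents closest to each centroid are kept, so the reduced set still covers the variety of the class. This illustrates the idea rather than reproducing the paper's exact technique.

```python
# Undersample the majority class by k-means clustering and keeping one
# representative document per cluster. Data shapes are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(0)
majority_X = rng.normal(size=(5000, 300))   # e.g. averaged word embeddings of non-anaphylaxis EMRs
minority_size = 50                           # anaphylaxis cases are below 1% of the records

kmeans = KMeans(n_clusters=minority_size, n_init=10, random_state=0).fit(majority_X)
closest_idx, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, majority_X)
balanced_majority = majority_X[closest_idx]  # one representative document per cluster

print(balanced_majority.shape)               # (50, 300): now comparable to the minority class
```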


Subjects
Anaphylaxis/diagnosis, Anaphylaxis/epidemiology, Electronic Health Records, Machine Learning, Medical Informatics/methods, Neural Networks, Computer, Algorithms, Big Data, Cluster Analysis, Decision Making, Humans, Language, Linear Models, Regression Analysis
7.
BMC Bioinformatics ; 19(Suppl 8): 209, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29897318

ABSTRACT

BACKGROUND: Deep Neural Networks (DNN), in particular Convolutional Neural Networks (CNN), have recently achieved state-of-the-art results for the task of Drug-Drug Interaction (DDI) extraction. Most CNN architectures incorporate a pooling layer to reduce the dimensionality of the convolution layer output, preserving relevant features and removing irrelevant details. All previous CNN-based systems for DDI extraction used max-pooling layers. RESULTS: In this paper, we evaluate the performance of various pooling methods (in particular max-pooling, average-pooling and attentive pooling), as well as their combination, for the task of DDI extraction. Our experiments show that max-pooling achieves a higher F1-score (64.56%) than attentive pooling (59.92%) and average-pooling (58.35%). CONCLUSIONS: Max-pooling outperforms the other alternatives because it is the only one that is invariant to the special pad tokens appended to shorter sentences (padding). Moreover, the combination of max-pooling and attentive pooling does not improve performance compared with max-pooling alone.
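
The three pooling strategies can be illustrated with short masked-pooling functions in PyTorch; the attention formulation below is one common variant and is an assumption, not necessarily the one evaluated in the paper.

```python
# Max, average and attentive pooling over a padded batch of convolution
# outputs. The mask marks real tokens; padded positions must be excluded.
import torch

def max_pool(h, mask):                          # h: (batch, seq, dim), mask: (batch, seq)
    return h.masked_fill(~mask.unsqueeze(-1), float("-inf")).max(dim=1).values

def avg_pool(h, mask):
    m = mask.unsqueeze(-1).float()
    return (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)

def attentive_pool(h, mask, w):                 # w: (dim,) learned attention vector (assumed form)
    scores = (h @ w).masked_fill(~mask, float("-inf"))
    alpha = torch.softmax(scores, dim=1).unsqueeze(-1)
    return (alpha * h).sum(dim=1)

h = torch.randn(2, 7, 16)
mask = torch.tensor([[1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1]]).bool()
w = torch.randn(16)
print(max_pool(h, mask).shape, avg_pool(h, mask).shape, attentive_pool(h, mask, w).shape)
```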


Subjects
Algorithms, Drug Interactions, Information Storage and Retrieval, Databases as Topic, Deep Learning, Humans, Neural Networks, Computer
8.
JMIR Med Inform ; 5(4): e48, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29196280

ABSTRACT

BACKGROUND: Biomedical semantic indexing is a very useful support tool for human curators in their efforts to index and catalog the biomedical literature. OBJECTIVE: The aim of this study was to describe a system that automatically assigns Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. METHODS: Our approach relies on the assumption that similar documents should be classified with similar MeSH terms. Although previous work has already exploited document similarity using a k-nearest neighbors algorithm, we represent documents as document vectors through search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that if three MeSH terms proposed to classify an article share the same ancestor, they should be replaced by this ancestor. Representing the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. RESULTS: Our experiments show promising results, with an F1 of 69% on the test dataset. CONCLUSIONS: To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, Elasticsearch is a practical solution for indexing large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time.
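
A toy sketch of the retrieval-and-ranking idea: candidate MeSH terms taken from the most similar retrieved documents are scored by summed similarity, so both frequency and similarity contribute as described above; the exact weighting is an illustrative choice, not necessarily the paper's scoring function.

```python
# Rank candidate MeSH terms for an input abstract from its nearest neighbours.
from collections import defaultdict

# (cosine similarity to the input document, MeSH terms of the retrieved document)
retrieved = [
    (0.91, {"Drug Interactions", "Natural Language Processing"}),
    (0.84, {"Natural Language Processing", "Neural Networks, Computer"}),
    (0.77, {"Drug Interactions", "Pharmacovigilance"}),
]

scores = defaultdict(float)
for similarity, mesh_terms in retrieved:
    for term in mesh_terms:
        scores[term] += similarity           # frequent and highly similar terms score higher

ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:3])                             # top candidate MeSH terms for the input document
```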

9.
J Biomed Semantics ; 8(1): 45, 2017 Sep 29.
Article in English | MEDLINE | ID: mdl-28962645

ABSTRACT

BACKGROUND: Drug Package Leaflets (DPLs) provide information for patients on how to safely use medicines, and pharmaceutical companies are responsible for producing these documents. However, several studies have shown that patients usually have problems understanding the sections describing posology (dosage quantity and prescription), contraindications and adverse drug reactions. An ultimate goal of this work is to provide an automatic approach that helps these companies write drug package leaflets in easy-to-understand language. Natural language processing has become a powerful tool for improving patient care and advancing medicine because it makes it possible to automatically process the large amount of unstructured information needed for patient care. However, to the best of our knowledge, no research has been done on the automatic simplification of drug package leaflets. In previous work, we proposed using domain terminological resources to gather a set of synonyms for a given target term. A potential drawback of this approach is that it depends heavily on the existence of dictionaries; however, these are not available for every domain and language and, when they exist, their coverage is often very limited. To overcome this limitation, we propose the use of word embeddings to identify the simplest synonym for a given term. Word embedding models represent each word in a corpus with a vector in a semantic space. Our approach is based on the assumption that synonyms should have close vectors because they occur in similar contexts. RESULTS: In our evaluation, we used the EasyDPL (Easy Drug Package Leaflets) corpus, a collection of 306 leaflets written in Spanish and manually annotated with 1,400 adverse drug effects and their simplest synonyms. We focus on leaflets written in Spanish because it is the second most widely spoken language in the world, yet terminological resources for Spanish are usually much scarcer than for English. Our experiments show an accuracy of 38.5% using word embeddings. CONCLUSIONS: This work provides a promising approach to simplifying DPLs without using terminological resources or parallel corpora. Moreover, it could be easily adapted to different domains and languages. However, more research effort is needed to improve our word-embedding-based approach, because it does not yet outperform our previous dictionary-based work.
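
A small sketch of the embedding-based synonym lookup using gensim; the vector file path and the example term are hypothetical.

```python
# Nearest neighbours of an adverse-effect term in a pre-trained embedding
# space are taken as candidate synonyms for simplification.
from gensim.models import KeyedVectors

# Pre-trained Spanish vectors in word2vec text format (path is hypothetical).
vectors = KeyedVectors.load_word2vec_format("embeddings-es.vec", binary=False)

term = "cefalea"                      # technical term found in a leaflet
for candidate, similarity in vectors.most_similar(term, topn=10):
    print(f"{candidate}\t{similarity:.3f}")

# In the paper's setting, the simplest candidate among the neighbours would
# then be selected, e.g. by corpus frequency.
```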


Subjects
Drug Packaging/methods, Language, Semantics, Writing, Algorithms, Comprehension, Humans, Patient Care/methods, Spain
10.
Database (Oxford) ; 2017: 2017 01 01.
Article in English | MEDLINE | ID: mdl-28605776

ABSTRACT

Drug-drug interaction (DDI), a specific type of adverse drug reaction, occurs when a drug influences the level or activity of another drug. Natural language processing techniques can provide health-care professionals with a novel way of reducing the time spent reviewing the literature for potential DDIs. The current state of the art for the extraction of DDIs is based on feature-engineering algorithms (such as support vector machines), which usually require considerable time and effort. One possible alternative to these approaches is deep learning, which aims to automatically learn the best feature representation from the input data for a given task. The purpose of this paper is to examine whether a convolutional neural network (CNN) that uses only word embeddings as input features can be applied successfully to classify DDIs from biomedical texts. Proposed herein is a CNN architecture with only one hidden layer, making the model more computationally efficient, and we perform detailed experiments in order to determine the best settings of the model. The goal is to determine the best parameters of this basic CNN that should be considered in future research. The experimental results show that the proposed approach is promising, as it attained second position in the 2013 rankings of the DDI extraction challenge. However, it obtained worse results than previous works using neural networks with more complex architectures.


Subjects
Data Mining/methods, Drug Interactions, Electronic Data Processing/methods, Models, Theoretical, Natural Language Processing, Neural Networks, Computer, Support Vector Machine
12.
J Chem Inf Model ; 55(8): 1698-707, 2015 Aug 24.
Article in English | MEDLINE | ID: mdl-26147071

ABSTRACT

The early detection of drug-drug interactions (DDIs) is limited by the diffuse spread of DDI information across heterogeneous sources. Computational methods promise to play a key role in the identification and explanation of DDIs on a large scale; however, such methods rely on the availability of computable representations of the relevant domain knowledge. Previous modeling efforts have focused on partial and shallow representations of the DDI domain, failing to adequately support computational inference and discovery applications. In this paper, we describe a comprehensive ontology of DDI knowledge (DINTO), the first formal representation of the different types of DDIs and their mechanisms, and its application to the prediction of DDIs. The ontology has been developed using currently available semantic web technologies, standards, and tools, and we have demonstrated that the combination of drug-related facts in DINTO and Semantic Web Rule Language (SWRL) rules can be used to infer DDIs and their different mechanisms on a large scale. The ontology is available at https://code.google.com/p/dinto/.
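
A toy illustration (built with owlready2 on a throwaway ontology, not DINTO itself) of how drug-related facts plus a SWRL rule can yield an inferred DDI; the classes, properties, rule and example drugs are assumptions for demonstration only.

```python
# If drug A inhibits an enzyme that metabolises drug B, infer that A interacts
# with B. Requires Java for the Pellet reasoner bundled with owlready2.
from owlready2 import get_ontology, Thing, ObjectProperty, Imp, sync_reasoner_pellet

onto = get_ontology("http://example.org/ddi-demo.owl")

with onto:
    class Drug(Thing): pass
    class Enzyme(Thing): pass
    class inhibits(ObjectProperty): domain = [Drug]; range = [Enzyme]
    class metabolised_by(ObjectProperty): domain = [Drug]; range = [Enzyme]
    class interacts_with(ObjectProperty): domain = [Drug]; range = [Drug]

    rule = Imp()
    rule.set_as_rule(
        "Drug(?a), Drug(?b), Enzyme(?e), inhibits(?a, ?e), metabolised_by(?b, ?e)"
        " -> interacts_with(?a, ?b)"
    )

    fluvoxamine, theophylline, cyp1a2 = Drug("fluvoxamine"), Drug("theophylline"), Enzyme("cyp1a2")
    fluvoxamine.inhibits.append(cyp1a2)
    theophylline.metabolised_by.append(cyp1a2)

sync_reasoner_pellet(infer_property_values=True)
print(fluvoxamine.interacts_with)     # expected to contain theophylline after inference
```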


Subjects
Drug Interactions, Databases, Pharmaceutical, Humans, Internet, Semantics, Software
13.
BMC Med Inform Decis Mak ; 15 Suppl 2: S6, 2015.
Article in English | MEDLINE | ID: mdl-26100267

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have shown great interest in the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available so that both healthcare professionals and patients can report possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications. METHODS: In this work we present a system for detecting drug effects (which include both adverse drug reactions and drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs and their effects, built automatically from drug package inserts gathered from online websites. We then applied a distant-supervision method using this database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information from the sentences. RESULTS: For the extraction of relations between drugs and their effects, the distant-supervision approach achieved a recall of 0.59 and a precision of 0.48. CONCLUSIONS: Extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications that is not usually included in lexical resources for languages other than English.
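
The distant-supervision step can be sketched as matching (drug, effect) pairs from the database against forum sentences to produce weakly labelled relation instances; the database rows and posts below are invented.

```python
# Every sentence mentioning a drug and an effect that co-occur as a pair in
# the database is taken as a positive relation instance.
import re

# Knowledge base built from drug package inserts: (drug, effect) pairs.
drug_effect_db = {("ibuprofeno", "dolor de estómago"), ("paracetamol", "urticaria")}
drugs = {d for d, _ in drug_effect_db}
effects = {e for _, e in drug_effect_db}

posts = [
    "Desde que tomo ibuprofeno tengo dolor de estómago casi a diario.",
    "El paracetamol me quitó la fiebre sin problemas.",
]

instances = []
for sentence in posts:
    text = sentence.lower()
    found_drugs = [d for d in drugs if re.search(r"\b" + re.escape(d) + r"\b", text)]
    found_effects = [e for e in effects if e in text]
    for d in found_drugs:
        for e in found_effects:
            label = "related" if (d, e) in drug_effect_db else "unrelated"
            instances.append((sentence, d, e, label))

for inst in instances:
    print(inst)   # weakly labelled training data for the relation classifier
```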


Subjects
Data Mining/methods, Drug-Related Side Effects and Adverse Reactions, Machine Learning, Natural Language Processing, Pharmacovigilance, Social Media, Humans, Information Dissemination/methods, Internet, Language
14.
J Cheminform ; 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): S2, 2015.
Article in English | MEDLINE | ID: mdl-25810773

ABSTRACT

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each chemical entity mention was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured in an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the gold standard manual annotations, but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for the minimum information required about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

15.
J Biomed Inform ; 51: 152-64, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24858490

ABSTRACT

The DDIExtraction Shared Task 2013 is the second edition of the DDIExtraction Shared Task series, a community-wide effort to promote the implementation and comparative assessment of natural language processing (NLP) techniques in the pharmacovigilance domain, in particular to address the extraction of drug-drug interactions (DDI) from biomedical texts. This edition was the first attempt to compare the performance of Information Extraction (IE) techniques specific to each of the basic steps of the DDI extraction pipeline. To this end, two main tasks were proposed: the recognition and classification of pharmacological substances and the detection and classification of drug-drug interactions. DDIExtraction 2013 was held from January to June 2013 and attracted wide attention, with a total of 14 teams from 7 different countries (6 teams participated in the drug name recognition task and 8 in the DDI extraction task). For the recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while, for the detection and classification of DDIs, the best result was an F1 of 65.1%. The results show advances in the state of the art and demonstrate that significant challenges remain to be resolved. This paper focuses on the second task (extraction of DDIs) and examines its main challenges, which have yet to be resolved.


Subjects
Databases, Pharmaceutical, Drug Interactions, MEDLINE, Natural Language Processing, Pattern Recognition, Automated/methods, Periodicals as Topic, Vocabulary, Controlled, Data Mining/methods, Database Management Systems, Pharmacovigilance, Semantics, Software
16.
J Biomed Inform ; 46(5): 914-20, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23906817

ABSTRACT

The management of drug-drug interactions (DDIs) is a critical issue because of the overwhelming amount of information available on them. Natural Language Processing (NLP) techniques can provide an interesting way to reduce the time spent by healthcare professionals reviewing the biomedical literature. However, NLP techniques rely mostly on the availability of annotated corpora. While there are several corpora annotated with biological entities and their relationships, there is a lack of corpora annotated with pharmacological substances and DDIs. Moreover, other works in this field have focused only on pharmacokinetic (PK) DDIs, not on pharmacodynamic (PD) DDIs. To address this problem, we have created a manually annotated corpus consisting of 792 texts selected from the DrugBank database and another 233 MEDLINE abstracts. This fine-grained corpus has been annotated with a total of 18,502 pharmacological substances and 5,028 DDIs, including both PK and PD interactions. The quality and consistency of the annotation process have been ensured through the creation of annotation guidelines and evaluated by measuring the inter-annotator agreement between two annotators. The agreement was almost perfect (Kappa up to 0.96 and generally over 0.80), except for the DDIs in the MEDLINE abstracts (0.55-0.72). The DDI corpus has been used in the SemEval 2013 DDIExtraction challenge as a gold standard for the evaluation of information extraction techniques applied to the recognition of pharmacological substances and the detection of DDIs from biomedical texts. DDIExtraction 2013 attracted wide attention, with a total of 14 teams from 7 different countries. For the recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while, for the detection and classification of DDIs, the best result was an F1 of 65.1%. These results show that the corpus has sufficient quality to be used for training and testing NLP techniques applied to the field of pharmacovigilance. The DDI corpus and the annotation guidelines are free for academic research and are available at http://labda.inf.uc3m.es/ddicorpus.
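
The kappa agreement reported above can be illustrated with a short Cohen's kappa computation over two annotators' decisions; the label sequences below are invented.

```python
# Cohen's kappa: observed agreement corrected for chance agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["ddi", "ddi", "no_ddi", "no_ddi", "ddi", "no_ddi", "no_ddi", "ddi"]
b = ["ddi", "no_ddi", "no_ddi", "no_ddi", "ddi", "no_ddi", "no_ddi", "ddi"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```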


Subjects
Drug Interactions, Guidelines as Topic
17.
J Biomed Inform ; 44(5): 789-804, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21545845

ABSTRACT

A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. Information Extraction (IE) techniques can provide health care professionals with an interesting way to reduce the time spent reviewing the literature for potential drug-drug interactions. Nevertheless, no approach has been proposed for the problem of extracting DDIs from biomedical texts. In this article, we study whether a machine learning-based method is appropriate for DDI extraction from biomedical texts and whether its results are superior to those obtained with our previously proposed pattern-based approach. The method proposed here for DDI extraction is based on a supervised machine learning technique, more specifically the shallow linguistic kernel proposed by Giuliano et al. (2006). Since no benchmark corpus was available to evaluate our approach to DDI extraction, we created the first such corpus, DrugDDI, annotated with 3,169 DDIs. We performed several experiments varying the configuration parameters of the shallow linguistic kernel. The model that maximizes the F-measure was evaluated on the test data of the DrugDDI corpus, achieving a precision of 51.03%, a recall of 72.82% and an F-measure of 60.01%. To the best of our knowledge, this work proposes the first full solution for the automatic extraction of DDIs from biomedical texts. Our study confirms that the shallow linguistic kernel outperforms our previous pattern-based approach. Additionally, it is our hope that the DrugDDI corpus will allow researchers to explore new solutions to the DDI extraction problem.
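
As a simplified stand-in for the shallow linguistic kernel (not the Giuliano et al. kernel itself), a linear SVM over word n-grams of entity-blinded sentences conveys the flavour of kernel-based DDI classification; the sentences and the DRUG1/DRUG2 marking are assumptions.

```python
# Simplified stand-in for kernel-based DDI classification: word n-gram
# features of sentences with the two candidate drugs already marked.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

sentences = [
    "DRUG1 may increase the anticoagulant effect of DRUG2 .",
    "DRUG1 should not be co-administered with DRUG2 .",
    "DRUG1 and DRUG2 were both well tolerated in the study .",
    "The pharmacokinetics of DRUG1 were not affected by DRUG2 .",
]
labels = ["ddi", "ddi", "no_ddi", "no_ddi"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["Concomitant use of DRUG1 enhances the toxicity of DRUG2 ."]))
```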


Subjects
Drug Interactions, Information Storage and Retrieval/methods, Software, Algorithms, Artificial Intelligence, Natural Language Processing
18.
BMC Bioinformatics ; 12 Suppl 2: S1, 2011 Mar 29.
Article in English | MEDLINE | ID: mdl-21489220

ABSTRACT

BACKGROUND: A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up to date with all published studies on DDIs. METHODS: This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on the shallow syntactic parsing provided by the UMLS MetaMap Transfer tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses, from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDIs in texts. These lexical patterns are matched against the generated sentences in order to extract DDIs. RESULTS: We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. CONCLUSIONS: Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals reviewing the literature. Nevertheless, no approach had previously been carried out to extract DDIs from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDIs from biomedical texts.
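
The pattern-matching stage can be sketched with a few regular-expression lexical patterns applied to simplified sentences; the patterns and sentences below are illustrative, not the pharmacist-defined set used in the paper.

```python
# Flag candidate DDIs by matching lexical patterns against simplified sentences.
import re

DDI_PATTERNS = [
    r"\b(interact(s|ion)?\s+with)\b",
    r"\b(should\s+not\s+be\s+(used|combined|co-administered)\s+with)\b",
    r"\b(increase(s|d)?\s+the\s+(effect|level|toxicity)s?\s+of)\b",
    r"\b(inhibit(s|ed)?\s+the\s+metabolism\s+of)\b",
]

simplified_sentences = [
    "Warfarin interacts with aspirin.",
    "Ketoconazole inhibits the metabolism of terfenadine.",
    "Amoxicillin is indicated for bacterial infections.",
]

for sentence in simplified_sentences:
    matched = any(re.search(p, sentence, flags=re.IGNORECASE) for p in DDI_PATTERNS)
    print(f"{'DDI   ' if matched else 'no DDI'} | {sentence}")
```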


Subjects
Drug Interactions, Information Storage and Retrieval/methods, Pattern Recognition, Automated/methods, Algorithms, Artificial Intelligence, Linguistics
19.
J Biomed Inform ; 43(6): 902-13, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20688192

ABSTRACT

Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to prioritize their identification and validation in order to reduce long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has increased rapidly. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. The identification of drug target articles, a crucial first step toward the automatic extraction of information from texts, is the aim of this paper. A comparison of several machine learning techniques has been performed in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result has been achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%.


Subjects
Artificial Intelligence, Data Mining/methods, Drug Discovery, Algorithms, Pharmaceutical Preparations/chemistry, Unified Medical Language System
20.
BMC Bioinformatics ; 11 Suppl 2: S1, 2010 Apr 16.
Article in English | MEDLINE | ID: mdl-20406499

ABSTRACT

BACKGROUND: Drug-drug interactions are frequently reported in the increasing amount of biomedical literature, and Information Extraction (IE) techniques have been devised as a useful instrument for managing this knowledge. Nevertheless, IE at the sentence level has a limited effect because of the frequent references to previous entities in the discourse, a phenomenon known as 'anaphora'. DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in the pharmacological literature. This development is part of a larger study on automatic drug-drug interaction extraction. METHODS: The system uses a set of linguistic rules drawn from Centering Theory over the analysis provided by a biomedical syntactic parser. Semantic information provided by the Unified Medical Language System (UMLS) is also integrated in order to improve the recognition and resolution of nominal drug anaphors. In addition, a corpus has been developed in order to analyze the phenomenon and evaluate the current approach. Each possible case of anaphoric expression was examined to determine the most effective way of resolving it. RESULTS: An F-score of 0.76 in anaphora resolution was achieved, significantly outperforming the baseline by almost 73%. This ad-hoc baseline was developed to check the results, as there is no previous work on anaphora resolution in pharmacological documents. The results obtained resemble those found in related semantic domains. CONCLUSIONS: The present approach shows very promising results for the challenge of accounting for anaphoric expressions in pharmacological texts. DrugNerAR obtains results similar to other approaches dealing with anaphora resolution in the biomedical domain, but, unlike those approaches, it focuses on documents describing drug interactions. Centering Theory has proved effective for the selection of antecedents in anaphora resolution. A key component in the success of this framework is the analysis provided by the MMTx program and the DrugNer system, which makes it possible to deal with the complexity of pharmacological language. We expect the positive results of the resolver to increase the performance of our future drug-drug interaction extraction system.
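
A very small rule-based sketch in the spirit of antecedent selection: candidate drug mentions are filtered by a compatibility check and the most recent compatible mention is preferred. This is far simpler than the Centering-based rules of DrugNerAR and uses invented data.

```python
# Pick an antecedent for a drug anaphor from preceding candidate mentions.
def resolve(anaphor, candidates):
    """candidates: list of (mention, sentence_index, is_plural), most recent last."""
    compatible = [c for c in candidates if c[2] == anaphor["is_plural"]]  # number agreement
    if not compatible:
        return None
    # Prefer the most recent compatible mention (a crude stand-in for salience ranking).
    return max(compatible, key=lambda c: c[1])[0]

candidates = [
    ("warfarin", 0, False),
    ("macrolide antibiotics", 1, True),
    ("erythromycin", 2, False),
]
anaphor = {"text": "this drug", "is_plural": False, "sentence_index": 3}

print(resolve(anaphor, candidates))   # -> "erythromycin"
```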


Subjects
Computational Biology/methods, Data Mining/methods, Drug Interactions, Pharmacology/methods, Software, Databases, Factual, Humans, Semantics, Unified Medical Language System